Watsonx.data Presto

Summary

This document covers the information to gather from IBM Watsonx.data in order to configure a Qarbine data service. The data service uses the Qarbine Watson Presto driver. You can define multiple data services that access the same Watsonx.data endpoint with different credentials. Once a data service is defined, you can manage which Qarbine principals have access to it and its associated data. A Qarbine administrator can see all data services.

Watsonx.data Configuration

Overview

For Qarbine to access your Watsonx.data content, it needs at least the following information:

  • Cloud resource name (CRN),
  • Region, and
  • API key.

Information about the Presto-based IBM Watsonx.data service can be found at https://prestodb.io/ and
https://developer.ibm.com/articles/awb-introduction-to-presto-a-query-engine-powering-watsonx-data/

The CRN and API key values are used to obtain a token from the service endpoint at
https://iam.cloud.ibm.com/identity/token
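
As a rough illustration of the token step, the sketch below builds the standard IBM Cloud IAM request that exchanges an API key for a token. It only constructs the request and does not send it. The function name build_token_request and the placeholder key are made up for this example. The standard exchange shown here needs only the API key; how the Qarbine driver uses the CRN afterwards is not shown.

```python
import urllib.parse
import urllib.request

IAM_TOKEN_URL = "https://iam.cloud.ibm.com/identity/token"


def build_token_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the IAM request that exchanges an API key for a token."""
    form = urllib.parse.urlencode({
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
        "apikey": api_key,
    }).encode("ascii")
    return urllib.request.Request(
        IAM_TOKEN_URL,
        data=form,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )


# "YOUR_API_KEY" is a placeholder, not a real key.
request = build_token_request("YOUR_API_KEY")
print(request.get_method(), request.full_url)

# Sending the request needs network access and a valid key, for example:
#   import json
#   with urllib.request.urlopen(request) as response:
#       token = json.load(response)["access_token"]
```

The request is kept separate from the network call so the form fields can be inspected before any credentials leave the machine.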

Sign on to your IBM account at https://cloud.ibm.com/.

You can also sign in by navigating to https://www.ibm.com/products/watsonx-data and clicking “Log in” in the upper right as shown.

  

Another option is to navigate to
https://dataplatform.cloud.ibm.com/registration/stepone

Choose the region

  

Click

  

Sign in and, if necessary, navigate to the Watsonx.data page. This path may take you to the watsonx AI home page instead:
https://dataplatform.cloud.ibm.com/wx/home

Here is another way. Navigate to https://www.ibm.com/products/watsonx-data and
click “Start free trial”. Fill in the form again and click Continue.

Obtaining the CRN and Region

A sample starting point which has the region in the URL is https://us-south.lakehouse.cloud.ibm.com.

In the bottom left navigate to the option highlighted below.

  

A portion of the dialog is shown below.

  

Copy the region and CRN values into a temporary area.
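
Before pasting the values into Qarbine, it can help to confirm that the region and the CRN agree. The sketch below assumes the general IBM Cloud CRN layout of ten colon-separated fields with the location in the sixth field. The CRN in the example is made up, and both function names are hypothetical helpers, not part of Qarbine or IBM tooling.

```python
from urllib.parse import urlparse


def region_from_console_url(url: str) -> str:
    """Return the leading host label, e.g. 'us-south' for us-south.lakehouse.cloud.ibm.com."""
    host = urlparse(url).hostname or ""
    return host.split(".")[0]


def region_from_crn(crn: str) -> str:
    """Return the location segment (sixth colon-separated field) of an IBM Cloud CRN."""
    fields = crn.split(":")
    if len(fields) != 10 or fields[0] != "crn":
        raise ValueError(f"not a 10-field CRN: {crn!r}")
    return fields[5]


# Made-up CRN for illustration only; copy the real one from the console dialog.
EXAMPLE_CRN = (
    "crn:v1:bluemix:public:lakehouse:us-south:"
    "a/0123456789abcdef:11111111-2222-3333-4444-555555555555::"
)

url_region = region_from_console_url("https://us-south.lakehouse.cloud.ibm.com")
crn_region = region_from_crn(EXAMPLE_CRN)
print(url_region, crn_region, url_region == crn_region)
```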

Obtaining an API Key

Information on API keys can be found at
https://cloud.ibm.com/docs/account?topic=account-manapikey&code=curl

Information on generating an IAM token from an API key can be found at
https://cloud.ibm.com/docs/account?topic=account-iamtoken_from_apikey&code=curl

Navigate to the Watsonx.data access page using the highlighted option below.

  

Click on the red highlighted sidebar option.

  

Click

  

Enter a name

  

Click

  

Click the highlighted link to copy the API key.

  

Click the dialog

  

Paste the API key in a temporary spot.

Qarbine Configuration

Overview

The Qarbine administrator can configure a Qarbine data service in several ways.

Data Service Configuration and Its Effect on the “Databases” Listed in the Query Tools

  • No specific catalog and database: the list is built from all of the catalog names and their contained schemas. Each list element has the format “catalog.schema”.
  • A specific catalog: the list contains the names of the Presto schemas within the specified catalog.
  • A specific catalog and database: the list has a single entry, the specified database (also known as a Presto schema).
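
The three cases above can be sketched as a small function. This is only an illustration of the listing rules as described, not Qarbine's implementation. The function name listed_databases is hypothetical, and the catalog my_catalog with schema my_schema is invented for the example. The sample_data catalog with its gosales and fintech schemas comes from the Watsonx.data sample content.

```python
from typing import Dict, List, Optional


def listed_databases(
    catalogs: Dict[str, List[str]],
    catalog: Optional[str] = None,
    database: Optional[str] = None,
) -> List[str]:
    """Return the "Databases" entries for a given data service configuration."""
    if catalog is None:
        # No specific catalog: every schema, qualified as "catalog.schema".
        # (A database given without a catalog is not covered by the rules; it is ignored here.)
        return [f"{name}.{schema}" for name, schemas in catalogs.items() for schema in schemas]
    if database is None:
        # A specific catalog: the schema names within that catalog.
        return list(catalogs.get(catalog, []))
    # A specific catalog and database: a single entry.
    return [database]


SAMPLE = {
    "sample_data": ["gosales", "fintech"],
    "my_catalog": ["my_schema"],  # hypothetical catalog and schema
}

print(listed_databases(SAMPLE))
print(listed_databases(SAMPLE, catalog="sample_data"))
print(listed_databases(SAMPLE, catalog="sample_data", database="gosales"))
```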

You can obtain the Presto catalog and schema values from the Watsonx.data console.

Compute Node Preparation

Determine which compute node service endpoint you want to run this data access from. That URL will go into the Data Service’s Compute URL field. Its form is “https://domain:port/dispatch”. A sample is shown below.
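
As a quick sanity check of the “https://domain:port/dispatch” form, the sketch below tests a candidate Compute URL. The host name and port in the example are made up, and is_compute_url is a hypothetical helper rather than anything Qarbine provides.

```python
from urllib.parse import urlparse


def is_compute_url(url: str) -> bool:
    """Check that a URL has the form https://domain:port/dispatch."""
    parts = urlparse(url)
    try:
        port = parts.port  # raises ValueError when the port is not a valid number
    except ValueError:
        return False
    return (
        parts.scheme == "https"
        and bool(parts.hostname)
        and port is not None
        and parts.path == "/dispatch"
    )


# Made-up host and port for illustration.
print(is_compute_url("https://qarbine.example.com:8443/dispatch"))
print(is_compute_url("https://qarbine.example.com/dispatch"))
```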

  

The port number corresponds to a named service endpoint configured on the given target host. For example, the primary compute node is usually configured with a “main” service. That service’s configuration is defined in the ˜./qarbine.service/config/service.main.json file. Inside that file the following driver entry is required:

"drivers": [
  . . .
  "./driver/ibmWatsonxDriver.js"
]

The relevant configuration file name for non-primary Qarbine compute nodes (those other than main) is service.NAME.json. Make sure the file remains well-formed JSON, or a startup error is likely to occur. If you add the driver entry, restart the service using the general command line syntax

pm2 restart <service>

For example,

pm2 restart main

or simply

pm2 restart all
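
Because malformed JSON can cause a startup error, it may be worth checking the configuration before restarting. The sketch below parses configuration text and looks for the driver entry. It assumes "drivers" is a top-level key in the file, which the fragment above does not confirm. The function name has_watsonx_driver and the other driver path in the sample are hypothetical.

```python
import json

DRIVER_ENTRY = "./driver/ibmWatsonxDriver.js"


def has_watsonx_driver(config_text: str) -> bool:
    """Parse service configuration text and report whether the driver entry is listed."""
    # json.loads raises json.JSONDecodeError when the text is not well-formed JSON.
    config = json.loads(config_text)
    # Assumption: "drivers" is a top-level array in the configuration.
    return DRIVER_ENTRY in config.get("drivers", [])


# Sample text for illustration; "./driver/someOtherDriver.js" is a made-up entry.
sample = """
{
  "drivers": [
    "./driver/someOtherDriver.js",
    "./driver/ibmWatsonxDriver.js"
  ]
}
"""
print(has_watsonx_driver(sample))

# To check a real file, read it first, for example:
#   from pathlib import Path
#   has_watsonx_driver(Path("service.main.json").read_text())
```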

Data Service Definition

Open the Administration Tool.

Navigate to the Data Services tab.

  

A data service defines which compute node a query runs on by default, along with the means to reach the target data. The latter includes which native driver to use and the settings for that driver. Multiple Data Sources can reference a single Data Service. The details of any one Data Service are thus maintained in one place rather than repeated in each Data Source, which would be a maintenance and support nightmare.

To begin adding a data service click

  

On the right hand side enter a name and optionally a description.

  

Set the Compute URL field based on the identified compute node above. Its form is “https://domain:port/dispatch”. A sample is shown below.

  

Also choose the driver shown below.

  

The server template is the CRN value you copied above. An example is shown below.

  

You can reference environment variables using the syntax %NAME%. Any strings should be quoted and the key/value pairs separated by commas.
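
To make the %NAME% syntax concrete, the sketch below substitutes such references from a mapping of variable values. It illustrates the syntax only and is not how Qarbine resolves the values. The option name someOption, the variable MY_API_KEY, and the choice to leave unknown names untouched are all assumptions made for this example.

```python
import os
import re
from typing import Mapping, Optional

# A reference is a variable name wrapped in percent signs, e.g. %MY_API_KEY%.
_REFERENCE = re.compile(r"%([A-Za-z_][A-Za-z0-9_]*)%")


def expand_references(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each %NAME% with the value of NAME; unknown names are left untouched."""
    values = os.environ if env is None else env
    return _REFERENCE.sub(lambda match: values.get(match.group(1), match.group(0)), text)


# Hypothetical option name and variable, with an explicit mapping instead of os.environ.
print(expand_references('someOption = "%MY_API_KEY%"', {"MY_API_KEY": "example-value"}))
```

Keeping secrets such as the API key in environment variables means the stored settings carry only the reference, not the value itself.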

The server options include your API key copied from above and the region. These values are quoted and separated by a comma as shown below.

  

Test your settings by clicking on the toolbar image highlighted below.

  

The result should be similar to the following.

  

In the above example the prestoEngines node has been expanded.

Save the Data Service by clicking on the image highlighted below.

  

The data service becomes available the next time you log on. Next, see the Presto query interaction documentation and any tutorials for information on interacting with Watsonx.data from Qarbine.

Pausing Watsonx.data Services

Useful information on managing resources can be found at
https://dataplatform.cloud.ibm.com/docs/content/wsj/admin/monitor-resources.html?context=wx

To limit resource unit (RU) consumption in your account consider pausing the engines. Navigate to the page shown below.

  

Notice the engines and other components as shown below.

  

Click the pause icon.

  

The component display will change to paused as shown below.

  

Sample Watsonx.data Content

Overview

There is an option to use Watsonx provided sample data. A sample page shown after signing on to Watsonx is https://us-south.lakehouse.cloud.ibm.com/. Your region and URL may be different.

Information on the sample data can be found at https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-tutorial_hp_intro.

The data can be seen by navigating as shown below.

  

Note - There are several DBA-oriented reports for the catalogs, schemas, and tables within a Watsonx.data configuration. See the “DBA Productivity” section of the documentation web site http://doc.qarbine.com.

Sample Catalogs

Shown below are the sample catalogs.

  

Sample Schemas

Expanding the sample_data catalog shows the following schemas.

  

Expanding the fintech schema shows the following tables.

  

GoSales Tables

Expanding the gosales schema shows the 27 tables within it.

    

The order_header table looks like the following.

  

The order_details table looks like the following.

  

The product_name_lookup table looks like the following.

  

FinTech Tables

Expanding the fintech schema shows the following tables.